DETAILED ACTION 

Specification 

1. The title of the invention is not descriptive. A new title is required that is clearly 
indicative of the invention to which the claims are directed. 

The following title is suggested: 

Determining Boundary Location Likelihoods for Speech and Background Noise 
Portions 

Claim Rejections - 35 USC § 102 

2. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

3. Claims 13, 18, 21, 37, 42, 45, 50, and 52 are rejected under 35 U.S.C. 102(b) as 
being anticipated by Chigier. 

Regarding independent claims 13, 37, 50, and 52, Chigier discloses an 
apparatus, method, computer executable process, and computer executable steps, 
comprising: 

"means for receiving the input signal" - an input speech signal 14 is received 
(column 4, lines 25 to 45: Figure 1); 

"means for processing the received signal to generate an energy signal indicative 
of the local energy within the received signal" - spectral analyzer 12 performs spectral 
analysis (e.g., computes a short term Fourier transform) on a window of samples to 
provide a feature vector sequence 16, consisting of a set of parameter coefficients (e.g., 
cepstral coefficients) characteristic of each speech frame (column 4, lines 46 to 59: 
Figure 1); cepstral coefficients are "an energy signal indicative of the local energy" 
because they represent a log energy of a speech signal (Figures 2 and 2A); 

"means for determining the likelihood that said boundary is located at each of a 
plurality of possible locations within said energy signal" - a boundary classifier 54 
assigns to each speech frame a probability ("the likelihood") that the speech frame 
corresponds to a boundary between two phonemes (column 6, lines 10 to 24: Figures 3 
and 3A); word boundaries 44 correspond to a case in which an initial sound 50 is 
classified as part of background signal 52 ("background noise containing portion") 
(column 5, line 64 to column 6, line 9: Figures 2 and 2A); 

"means for determining the location of said boundary using said likelihoods 
determined for each of said possible locations" - if a boundary probability assigned to a 
speech frame is greater than a first threshold (e.g., 70%), the frame is assumed to be a 
boundary by a segment generator 56, which generates a network of speech segments 
(A, B, and C); in operation, boundary classifier 54 classifies boundaries I, II, and III in a 
speech frame sequence 59; segment generator 56 produces speech segments A, B, 
and C based on the classified boundaries (column 6, lines 15 to 38: Figures 3 and 3A). 
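
The thresholding step described above can be illustrated with a minimal sketch. It is not taken from Chigier: the function name, the sample probabilities, and the choice to pair only consecutive boundaries into segments are assumptions made for illustration, and only the 70% threshold comes from the text above.

```python
def segments_from_boundaries(boundary_probs, threshold=0.70):
    """Pick frames whose boundary probability exceeds the threshold.

    Returns the boundary frame indices and a list of (start, end) segments
    formed between consecutive boundaries (an illustrative simplification).
    """
    boundaries = [i for i, p in enumerate(boundary_probs) if p > threshold]
    segments = list(zip(boundaries, boundaries[1:]))
    return boundaries, segments


# Hypothetical per-frame boundary probabilities.
probs = [0.05, 0.90, 0.10, 0.20, 0.85, 0.30, 0.75, 0.10]
boundaries, segments = segments_from_boundaries(probs)
print(boundaries)  # frames treated as boundaries
print(segments)    # segments between consecutive boundaries
```

How the reference builds its full network of segments A, B, and C is not stated in this action, so the pairing rule here should be read as a placeholder.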

Regarding claims 18 and 42, Chigier discloses that spectral analyzer 12 blocks a 
sampled speech signal into frames by placing a "window" over the samples that 
preserves the samples in the time interval of interest (column 4, lines 45 to 50: Figure 
1A). 
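
As a rough illustration of blocking a sampled signal into frames with a window, the following sketch may help. The Hamming window, the frame length, and the hop size are assumptions chosen for the example; the passage above says only that a "window" is placed over the samples.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames and apply a Hamming window.

    The Hamming window is an assumed choice, not one named in the reference.
    """
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


# Ten constant samples, frames of 4 samples advanced 2 samples at a time.
frames = frame_signal([1.0] * 10, frame_len=4, hop=2)
print(len(frames), [round(v, 2) for v in frames[0]])
```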

Regarding claims 21 and 45, Chigier discloses word boundaries 44 correspond 
to a case in which an initial sound 50 is classified as part of background signal 52 (e.g. 
when sound 50 is a typical mouth click or pop produced by opening the lips, prior to 
speaking), and boundaries 46 correspond to a case in which an initial sound is 
classified as part of a word (column 5, line 64 to column 6, line 9: Figures 2 and 2A); 
implicitly, at least a boundary at a beginning of a speech portion is detected. 

Claim Rejections - 35 USC § 103 

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

5. Claims 14, 15, 22, 38, 39, and 46 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Chigier in view of Cohrs et al. 

Concerning claims 14 and 38, Chigier discloses checking boundary probability 
classifications of one or more frames from either side of frame N (column 6, line 65 to 
column 7, line 1), but omits determining a boundary location by comparing with a model 
representative of energy in background noise and a model representative of energy in 
speech, and combining results of the comparisons to determine a likelihood for a 
current location. However, Cohrs et al. teaches computation of a similarity measure 
between stored references and parameters extracted from an utterance using hidden 
Markov models (HMMs). Hypothesizer 43 makes two types of hypotheses. The first 
type of hypothesis (referred to as a "background hypothesis") assumes that the feature 
vector sequence includes only background. The second type of hypothesis (referred to 
as a "phrase hypothesis") assumes that the feature sequence includes a command 
word. (Column 4, Line 59 to Column 5, Line 20: Figure 2) Cohrs et al. states there is 
an advantage in using models instead of thresholds for spotting command words by 
avoiding problems associated with false alarm rates for certain users. (Column 1, Lines 
31 to 63) It would have been obvious to one having ordinary skill in the art to determine 
boundaries by comparing to models of background noise and speech as taught by 
Cohrs et al. in the method and apparatus for boundary probability assignment of Chigier 
for the purpose of avoiding problems associated with using thresholds. 
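
The limitation at issue (comparing with a background noise model and a speech model, then combining the comparisons into a likelihood for each candidate location) can be sketched as follows. This is an illustration under stated assumptions, not the HMM-based method of Cohrs et al. and not the claimed implementation: the Gaussian models, their parameters, and the sample energies are all hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a normal distribution (assumed model form)."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def boundary_log_likelihoods(energies, noise=(0.0, 1.0), speech=(5.0, 1.0)):
    """Score each candidate boundary b: frames before b are compared with the
    noise model, frames from b onward with the speech model, and the two
    log-likelihoods are summed."""
    scores = []
    for b in range(1, len(energies)):
        before = sum(gaussian_log_pdf(e, *noise) for e in energies[:b])
        after = sum(gaussian_log_pdf(e, *speech) for e in energies[b:])
        scores.append((b, before + after))
    return scores


# Hypothetical log-energy values: three quiet frames, then three loud frames.
energies = [0.1, -0.2, 0.3, 4.8, 5.1, 5.3]
scores = boundary_log_likelihoods(energies)
best_boundary = max(scores, key=lambda s: s[1])[0]
print(best_boundary)
```

Because every candidate location receives a score from the models, no fixed energy threshold is needed to pick the boundary.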

Concerning claims 15 and 39, Cohrs et al. teaches an endpoint detector 42 for 
determining a beginning and end point of utterances embedded in speech, and outputs 
a finite sequence of speech vectors, corresponding to a single utterance; hypothesizer 
43 receives the speech vector sequences output by endpoint detector 42 (column 4, 
lines 53 to 65). 

Concerning claims 22 and 46, Cohrs et al. teaches hidden Markov models 
(HMMs) (column 4, lines 1 to 5), which are statistical models, implicitly. 

6. Claims 16, 17, 19, 20, 40, 41, 43, and 44 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over Chigier in view of Lennig et al. 

Concerning claims 16, 17, 40, and 41, Chigier omits filtering an energy signal to 
remove energy variations having a frequency below a predetermined frequency, where 
the filter is operable to filter out energy variations below 1 Hz. However, Lennig et al. 
teaches detecting word endpoints, where filter means 12 comprises a filter bank of 
twenty triangular filters spanning a range of about 100 Hz to about 4000 Hz. Weights 
Wij for filter channels j are set so that Wij=0 for frequencies fi below 100 Hz. (Column 3, 
Lines 4 to 40: Figure 1; Table 1: Filter No. 1) Thus, all energy variations at frequencies 
in the range between 0 Hz and 100 Hz are removed, including those energy variations 
at frequencies below 1 Hz. Lennig et al. suggests an advantage of reducing an error 
rate for speech recognition. (Column 1, Lines 19 to 26) It would have been obvious to 
one having ordinary skill in the art to filter an energy signal to remove energy variations 
having a frequency below a predetermined frequency as taught by Lennig et al. in the 
method and apparatus of boundary probability assignment of Chigier for the purpose of 
reducing an error rate for speech recognition. 
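
For orientation, filtering an energy signal to remove slow variations below a cutoff frequency can be sketched with a first-order high-pass filter. This illustrates the limitation as characterized above and is not the Lennig et al. filter bank; the one-pole design and the frame rate of 100 energy values per second are assumptions, and only the 1 Hz cutoff comes from the text.

```python
import math


def high_pass(energy, cutoff_hz=1.0, frame_rate_hz=100.0):
    """First-order (RC-style) high-pass filter over a sequence of energy values.

    frame_rate_hz is an assumed number of energy values per second.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / frame_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_x = energy[0] if energy else 0.0
    prev_y = 0.0
    for x in energy:
        # Pass changes between successive values; let steady levels decay.
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


flat = high_pass([3.0] * 50)             # constant level: nothing passes
step = high_pass([0.0] * 5 + [1.0] * 5)  # sudden rise: mostly preserved
print(max(abs(v) for v in flat), round(step[5], 3))
```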

Concerning claims 19, 20, 43, and 44, Chigier discloses speech samples 
(column 4, lines 60 to 66), and assigning boundary probabilities based on log energy 
(column 6, lines 10 to 24: Figures 2, 2A, and 3). 

7. Claims 23 and 47 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Chigier in view of Cohrs et al. as applied to claims 13, 14, 22, 37, 38, and 46 
above, and further in view of Abut et al. 

Cohrs et al. discloses hidden Markov models (HMMs), but omits models based 
on Laplacian statistics. However, Abut et al. discloses speech probability models based 
on Laplacian speech statistics. (II. Speech Statistics: Page 226) It is suggested that 
Laplacian statistics have lower and upper bounds suitable for speech probability 
models. (Page 227) It would have been obvious to one having ordinary skill in the art 
to utilize models based upon Laplacian statistics as suggested by Abut et al. in the 
method and apparatus for boundary probability assignment of Chigier in order to obtain 
suitable speech probability models. 

Claims 24 and 48 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Chigier in view of Cohrs et al. as applied to claims 13, 14, 22, 37, 38, and 46 
above, and further in view of Erell et al. 

Cohrs et al. discloses hidden Markov models (HMMs), but does not expressly 
state that a speech model is an auto-regressive model. However, Erell et al. teaches a 
speech recognition system where the acoustic features are extracted to form a feature 
vector, and where the features are the coefficients of an autoregressive model. Erell et 
al. states that these are the most commonly used features, including linear prediction 
coefficients, cepstrum coefficients, bank of filter energies etc., to reflect vocal tract 
characteristics. (Column 1, Lines 37 to 45) It would have been obvious to one of 
ordinary skill in the art to use an auto-regressive model in the method and apparatus for 
boundary probability assignment of Chigier because Erell et al. suggests that an 
auto-regressive model is the most commonly employed method of deriving speech features. 



Conclusion 

8. The prior art made of record and not relied upon is considered pertinent to 
Applicant's disclosure. 

Rees and Gupta et al. disclose related art. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Martin Lerner whose telephone number is (571) 272-7608. 
The examiner can normally be reached on 8:30 AM to 6:00 PM Monday to 
Thursday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil can be reached on (571) 272-7602. The fax phone 
number for the organization where this application or proceeding is assigned is 
571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

Martin Lerner 



Examiner 